Classifying Options for Deep Reinforcement Learning
In this paper we combine one method for hierarchical reinforcement learning -
the options framework - with deep Q-networks (DQNs) through the use of
different "option heads" on the policy network, and a supervisory network for
choosing between the different options. We utilise our setup to investigate the
effects of architectural constraints in subtasks with positive and negative
transfer, across a range of network capacities. We empirically show that our
augmented DQN has lower sample complexity when simultaneously learning subtasks
with negative transfer, without degrading performance when learning subtasks
with positive transfer.
Comment: IJCAI 2016 Workshop on Deep Reinforcement Learning: Frontiers and Challenges
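As a rough illustration of the setup this abstract describes, the following minimal PyTorch sketch shows a shared DQN torso with per-option "option heads" and a supervisory network that selects between them. The layer sizes, head count, and head-selection rule are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class OptionHeadDQN(nn.Module):
        def __init__(self, obs_dim, n_actions, n_options, hidden=128):
            super().__init__()
            # Shared torso: features reused by every option head.
            self.torso = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # One Q-value head per option (subtask).
            self.heads = nn.ModuleList(
                [nn.Linear(hidden, n_actions) for _ in range(n_options)]
            )
            # Supervisory network: scores which option head to use.
            self.supervisor = nn.Linear(hidden, n_options)

        def forward(self, obs):
            z = self.torso(obs)
            option = self.supervisor(z).argmax(dim=-1)           # chosen head per sample
            q_all = torch.stack([h(z) for h in self.heads], 1)   # (batch, options, actions)
            q = q_all[torch.arange(obs.shape[0]), option]        # Q-values from chosen head
            return q, option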
A Pragmatic Look at Deep Imitation Learning
The introduction of the generative adversarial imitation learning (GAIL)
algorithm has spurred the development of scalable imitation learning approaches
using deep neural networks. Many of the algorithms that followed used a similar
procedure, combining on-policy actor-critic algorithms with inverse
reinforcement learning. More recently, an even broader range of approaches has
emerged, most of which use off-policy algorithms. However, with this breadth
of algorithms, everything from datasets to base reinforcement learning
algorithms to evaluation settings can vary, making it difficult to fairly
compare them. In this work we re-implement 6 different imitation learning (IL)
algorithms, update 3 of them to be off-policy, base them all on a common
off-policy algorithm (SAC),
and evaluate them on a widely-used expert trajectory dataset (D4RL) for the
most common benchmark (MuJoCo). After giving all algorithms the same
hyperparameter optimisation budget, we compare their results for a range of
expert trajectories. In summary, GAIL, with all of its improvements,
consistently performs well across a range of sample sizes; AdRIL is a simple
contender that performs well with one important hyperparameter to tune; and
behavioural cloning remains a strong baseline when data is more plentiful.
Comment: Asian Conference on Machine Learning, 2022
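Since behavioural cloning serves as the reference baseline in this comparison, a minimal sketch may make it concrete: fit a policy to expert state-action pairs by regression. The network width, optimiser, and loss below are generic assumptions rather than the paper's settings.

    import torch
    import torch.nn as nn

    def behavioural_cloning(states, actions, epochs=100, lr=3e-4):
        # Fit a deterministic policy to expert (state, action) pairs
        # by minimising mean squared error on the expert's actions.
        policy = nn.Sequential(
            nn.Linear(states.shape[1], 256), nn.ReLU(),
            nn.Linear(256, actions.shape[1]),
        )
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = ((policy(states) - actions) ** 2).mean()
            loss.backward()
            opt.step()
        return policy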
The Societal Implications of Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) is an avenue of research in Artificial Intelligence (AI) that has received increasing attention within the research community in recent years, and is beginning to show potential for real-world application. DRL is one of the most promising routes towards developing more autonomous AI systems that interact with and take actions in complex real-world environments, and can more flexibly solve a range of problems for which we may not be able to precisely specify a correct “answer”. This could have substantial implications for people's lives: for example, by speeding up automation in various sectors, changing the nature and potential harms of online influence, or introducing new safety risks in physical infrastructure. In this paper, we review recent progress in DRL, discuss how this may introduce novel and pressing issues for society, ethics, and governance, and highlight important avenues for future research to better understand DRL's societal implications.

This article appears in the special track on AI and Society.
Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space
We focus on the challenge of finding a diverse collection of quality
solutions on complex continuous domains. While quality diversity (QD)
algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are
designed to generate a diverse range of solutions, these algorithms require a
large number of evaluations for exploration of continuous spaces. Meanwhile,
variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are
among the best-performing derivative-free optimizers in single-objective
continuous domains. This paper proposes a new QD algorithm called Covariance
Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the
self-adaptation techniques of CMA-ES with archiving and mapping techniques for
maintaining diversity in QD. Results from experiments based on standard
continuous optimization benchmarks show that CMA-ME finds better-quality
solutions than MAP-Elites; similarly, results on the strategic game Hearthstone
show that CMA-ME finds both a higher overall quality and broader diversity of
strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles
the performance of MAP-Elites using standard QD performance metrics. These
results suggest that QD algorithms augmented by operators from state-of-the-art
optimization algorithms can yield high-performing methods for simultaneously
exploring and optimizing continuous search spaces, with significant
applications to design, testing, and reinforcement learning among other
domains.
Comment: Accepted to GECCO 2020
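To make the archiving-and-mapping side of this concrete, here is a minimal sketch of a MAP-Elites-style archive of the kind CMA-ME's search writes into: solutions are binned by a behaviour descriptor, each bin keeps only its best ("elite") solution, and the success or failure of an insertion can be fed back to adapt the CMA-ES search distribution. The grid resolution and descriptor bounds are illustrative assumptions.

    import numpy as np

    class MapElitesArchive:
        def __init__(self, bins_per_dim=20, lo=-1.0, hi=1.0):
            self.bins = bins_per_dim
            self.lo, self.hi = lo, hi
            self.elites = {}  # behaviour bin -> (fitness, solution)

        def _bin(self, behaviour):
            # Map a continuous behaviour descriptor to a discrete grid cell.
            frac = (np.asarray(behaviour) - self.lo) / (self.hi - self.lo)
            return tuple(np.clip((frac * self.bins).astype(int), 0, self.bins - 1))

        def add(self, solution, fitness, behaviour):
            # Keep the solution only if its cell is empty or it beats the elite;
            # the returned flag is the improvement signal a search loop can adapt on.
            key = self._bin(behaviour)
            if key not in self.elites or fitness > self.elites[key][0]:
                self.elites[key] = (fitness, solution)
                return True
            return False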
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation
Deep reinforcement learning has the potential to train robots to perform
complex tasks in the real world without requiring accurate models of the robot
or its environment. A practical approach is to train agents in simulation, and
then transfer them to the real world. One popular method for achieving
transferability is to use domain randomisation, which involves randomly
perturbing various aspects of a simulated environment in order to make trained
agents robust to the reality gap. However, less work has gone into
understanding such agents - which are deployed in the real world - beyond task
performance. In this work we examine such agents, through qualitative and
quantitative comparisons between agents trained with and without visual domain
randomisation. We train agents for Fetch and Jaco robots on a visuomotor
control task and evaluate how well they generalise using different testing
conditions. Finally, we investigate the internals of the trained agents by
using a suite of interpretability techniques. Our results show that the primary
outcome of domain randomisation is more robust, entangled representations,
accompanied by larger weights with greater spatial structure; moreover, the
types of changes are heavily influenced by the task setup and presence of
additional proprioceptive inputs. Additionally, we demonstrate that our
domain-randomised agents have higher sample complexity, can overfit, and rely
more heavily on recurrent processing. Furthermore, even with an improved
saliency method introduced in this work, we show that qualitative studies may
not always correspond with quantitative measures, necessitating the combination
of inspection tools in order to provide sufficient insights into the behaviour
of trained agents.
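For readers unfamiliar with the technique being analysed, visual domain randomisation amounts to re-sampling the appearance of the simulated scene at the start of every training episode. The sketch below assumes a hypothetical simulator interface (body_names, set_body_colour, set_light_position, set_camera_offset); the actual parameters randomised and their ranges depend on the simulator and task.

    import numpy as np

    def randomise_visuals(sim, rng):
        # Re-sample scene appearance so the agent cannot overfit to one look.
        for name in sim.body_names:                                # hypothetical API
            sim.set_body_colour(name, rng.uniform(0.0, 1.0, size=3))
        sim.set_light_position(rng.uniform(-1.0, 1.0, size=3))     # hypothetical API
        sim.set_camera_offset(rng.normal(0.0, 0.02, size=3))       # hypothetical API

    # Called once at the start of every training episode:
    #   rng = np.random.default_rng(seed)
    #   randomise_visuals(sim, rng)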